European Radiology
Springer Science and Business Media LLC
Preprints posted in the last 90 days, ranked by how well they match European Radiology's content profile, based on 14 papers previously published here. The average preprint has a 0.04% match score for this journal, so anything above that is already an above-average fit.
Harrison, C. A.; Wu, M.; White, O.; Hopkinson, G.; Hughes, J.; Robertson, S.; Scurr, E.; Shur, J.; Castagnoli, F.; Charles-Edwards, G.; Koh, D.-M.; Winfield, J.
Objectives: AI-based reconstructions can reduce MRI acquisition times and/or improve image quality. Guidelines recommend clinical evaluations and post-deployment monitoring of these novel methods; however, there has been little investigation of the clinical resources required for such assessments. The aim of this study was to evaluate the healthcare resource utilisation and potential savings associated with AI-based reconstructions in rectal MRI. Methods: A retrospective economic costing analysis was conducted from the NHS healthcare perspective. Resource utilisation data were extracted from the Electronic Patient Records for 9 healthy volunteer scans and 104 rectal MRI examinations evaluating an AI-based reconstruction. The resource profile included the MRI scan and the staff time required for data acquisition and analysis. Results: The clinical evaluation of the AI-based reconstruction cost £15,023. Deployment of the AI-based reconstruction reduced the length of an MRI rectum scan by 22 minutes, theoretically saving approximately £3,437 per month. Addition of post-deployment quality control scans reduced this monthly saving to £2,636. If the quality control scans were evaluated using radiologists rather than image quality metrics, monthly savings would be approximately £2,541. With ongoing quality control, the clinical evaluation cost would be recouped between 5.8 and 6 months, compared with 4.4 months without ongoing quality control. Conclusions: Deploying AI-based reconstructions can yield cost savings through reduced scanning times. Quality control tests using image quality metrics would save radiological burden and reduce costs compared with conducting repeated image scoring by radiologists.
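The payback periods quoted in this abstract follow from dividing the one-off evaluation cost by the monthly saving. The sketch below redoes that arithmetic using only the rounded figures given in the abstract. The function name `payback_months` and the scenario labels are illustrative, and the authors' unrounded inputs may give slightly different values.

```python
def payback_months(evaluation_cost: float, monthly_saving: float) -> float:
    """Months needed for a constant monthly saving to cover a one-off cost."""
    if monthly_saving <= 0:
        raise ValueError("monthly_saving must be positive")
    return evaluation_cost / monthly_saving


# Rounded figures taken from the abstract (GBP).
EVALUATION_COST = 15_023
MONTHLY_SAVINGS = {
    "no ongoing quality control": 3_437,
    "quality control with image quality metrics": 2_636,
    "quality control scored by radiologists": 2_541,
}

if __name__ == "__main__":
    for scenario, saving in MONTHLY_SAVINGS.items():
        months = payback_months(EVALUATION_COST, saving)
        print(f"{scenario}: {months:.1f} months")
```

With these rounded inputs the three scenarios come out at roughly 4.4, 5.7 and 5.9 months, close to the 4.4 months and 5.8 to 6 months reported.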
Chaves, E. T.; Teunis, J. T.; Digmayer Romero, V. H.; van Nistelrooij, N.; Vinayahalingam, S.; Sezen-Hulsmans, D.; Mendes, F. M.; Huysmans, M.-C.; Cenci, M. S.; Lima, G. d. S.
Background: Radiographic detection of caries lesions adjacent to restorations is challenging due to limitations of two-dimensional imaging and difficulties distinguishing true lesions from restorative or anatomical radiolucencies. Artificial intelligence (AI)-based clinical decision support systems (CDSSs) have been introduced to assist radiographic interpretation; however, different AI tools may yield variable diagnostic outputs, and their comparative performance remains unclear. Objective: To compare the diagnostic performance of commercial and experimental AI algorithms for detecting secondary caries lesions on bitewings. Methods: This cross-sectional diagnostic accuracy study included 200 anonymized bitewings comprising 885 restored tooth surfaces. A consensus group reference standard identified all surfaces with a caries lesion and classified each lesion by type (primary/secondary) and depth (enamel-only/dentin-involved). Five commercial (Second Opinion®, CranioCatch, Diagnocat, DIO Inteligencia, and Align X-ray Insights) and three experimental (Mask R-CNN-based and Mask DINO-based) systems were tested. Diagnostic performance was expressed through sensitivity, specificity, and overall accuracy (95% CI). Comparisons used generalized estimating equations, adjusted for clustered data. Results: Specificity was high across all systems (0.957-0.986), confirming accurate recognition of non-carious surfaces, whereas sensitivity was moderate (0.327-0.487), reflecting frequent missed detections of enamel and dentin lesions. Accuracy ranged from 0.882 to 0.917, with no significant differences among models (p ≥ 0.05). Confounding factors, such as radiographic overlapping, marginal restoration defects, and cervical artifacts, were the main sources of misclassification. Conclusions: AI algorithms, regardless of architecture or commercial status, showed similar diagnostic capabilities and a conservative detection profile, favoring specificity over sensitivity. 
Improvements in dataset diversity, labeling precision, and explainability may further enhance reliability for secondary caries detection. Clinical Significance: AI-based CDSSs assist clinicians by providing consistent detection. Their high specificity is particularly valuable in minimizing unnecessary invasive treatments (overtreatment), though they should be used as adjuncts rather than a replacement for expert judgment.
Nishio, M.; Matsuo, H.; Matsunaga, T.; Fujimoto, K.; Deperrois, N.; Nooralahzadeh, F.; Frauenfelder, T.; Krauthammer, M.; Murakami, T.
Background and Objectives: The ability of vision-language models (VLMs) to detect lung nodules on chest radiographs remains uncertain. This retrospective study aimed to compare the zero-shot performances of six VLMs for lung nodule detection using data from the Japanese Society of Radiological Technology (JSRT) chest radiograph database. Methods: A total of 247 chest radiographs from the JSRT database (154 with nodules and 93 without) were preprocessed and evaluated using six VLMs: RadVLM, gpt-4o-mini, Qwen3-VL-8B-Instruct, MedGemma-4b-it, LLaVA-Rad, and CheXpert Plus Model. Each model was tested using a zero-shot setting. The text outputs were binarized into nodule-present or nodule-absent labels by consensus between the two radiologists. Sensitivity, specificity, accuracy, precision, and F1 scores were calculated. Pairwise differences in sensitivity, specificity, and accuracy were assessed using McNemar test with Holm correction. Results: The overall performance was limited across all models. RadVLM achieved the highest accuracy (44.5%, 110/247) with perfect specificity (100.0%, 93/93) and precision (100.0%); however, its sensitivity was low (11.0%, 17/154). LLaVA-Rad showed the highest sensitivity (27.3%, 42/154) and F1 score (37.7%), but lower specificity (71.0%, 66/93). MedGemma-4b-it achieved 100.0% specificity, with a sensitivity of only 5.2% (8/154). Grade-specific analysis showed that detection rates were highest for obvious nodules and remained limited for subtle nodules. Pairwise analyses revealed significant differences in sensitivity and specificity for the selected model pairs, particularly between RadVLM and LLaVA-Rad. Conclusion: Current VLMs show limited zero-shot generalizability for lung nodule detection in the JSRT database, with marked trade-offs between sensitivity and specificity. Their near-term value may lie more in radiologist-assisted workflows than in stand-alone detection. 
Clinical Impact: Current VLMs should not be used as stand-alone tools for lung nodule detection on chest radiographs because of their limited sensitivity and substantial model-dependent trade-offs. However, their high-specificity outputs in some models and higher-sensitivity behavior in others suggest potential roles in radiologist-assisted workflows, such as report drafting and second-reader support.
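The percentages reported for RadVLM and LLaVA-Rad can be rebuilt from the counts in the abstract (154 radiographs with nodules, 93 without). The sketch below derives the false-negative and false-positive counts by subtraction from those totals. The `Counts` class and `metrics` function are illustrative names and are not taken from the study's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    tp: int  # nodule present, model says present
    fn: int  # nodule present, model says absent
    tn: int  # nodule absent, model says absent
    fp: int  # nodule absent, model says present


def metrics(c: Counts) -> dict[str, float]:
    """Standard binary classification metrics as fractions in [0, 1]."""
    predicted_positive = c.tp + c.fp
    f1_denominator = 2 * c.tp + c.fp + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "accuracy": (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp),
        # Precision is undefined when the model never predicts "present".
        "precision": c.tp / predicted_positive if predicted_positive else float("nan"),
        "f1": 2 * c.tp / f1_denominator if f1_denominator else float("nan"),
    }


# Counts derived from the abstract: 17/154 and 93/93 for RadVLM,
# 42/154 and 66/93 for LLaVA-Rad.
radvlm = Counts(tp=17, fn=154 - 17, tn=93, fp=93 - 93)
llava_rad = Counts(tp=42, fn=154 - 42, tn=66, fp=93 - 66)

if __name__ == "__main__":
    for name, counts in (("RadVLM", radvlm), ("LLaVA-Rad", llava_rad)):
        summary = ", ".join(f"{k} {v * 100:.1f}%" for k, v in metrics(counts).items())
        print(f"{name}: {summary}")
```

The example makes the trade-off concrete: RadVLM's perfect specificity and precision come from producing no false positives, while LLaVA-Rad gains sensitivity at the cost of 27 false positives.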
Ueda, Y.; Okazaki, T.; Isome, H.; Patel, A.; Ichimasa, T.; Asaumi, R.; Kawai, T.; Suyama, K.; Hayashi, S.
Background: Vertebral artery calcification (VAC), a critical indicator of cerebrovascular disease, is often overlooked in head-and-neck imaging. Manual detection is time-consuming and prone to inter-observer variability. This study aimed to develop and validate a deep learning model for automated detection and quantitative risk assessment of VAC in non-contrast head-and-neck computed tomography (CT) images, bridging the diagnostic gap between dentistry and vascular medicine. Methods: We developed a deep learning model based on the ResNet-18 architecture, designated as Grayscale ResNet, optimized for single-channel CT images. The development followed a two-phase strategy: initial training on 539 axial images from head-and-neck CT images, followed by iterative refinement (fine-tuning) using a targeted dataset of clinically significant cases to ensure generalizability. The model's performance was evaluated using patient-level Receiver Operating Characteristic (ROC) analysis and saliency map visualization for clinical interpretability. Results: The optimized model demonstrated a robust performance in distinguishing between cases with and without VAC. In the independent cohort, the model achieved an area under the curve (AUC) of 0.846. At a specific threshold value (98.6%), the system yielded a sensitivity of 80.0% and a specificity of 90.6%. A saliency map analysis confirmed that the model consistently focused on anatomically relevant vascular regions. Conclusions: The proposed automated system provides an accurate and reliable method for VAC screening using routine head-and-neck CT scans. By transforming incidental imaging findings into a quantifiable risk index, this tool can serve as a vital decision-support system for dentists and radiologists, facilitating early patient referrals and contributing to global stroke prevention.
Wu, X.; Zhang, J.; He, Y.; Zhang, Y.; Kang, X.; Hu, W.; Li, Y.; Ma, H.; Wang, Y.; Song, Y.; Chen, X.; Huo, F.; Zhang, Y.; Yin, H.; Xi, Y.
Background: Traditional bone scintigraphy for detecting malignant bone metastases is limited by suboptimal accuracy and radiation exposure. Whole-body magnetic resonance imaging (WB-MRI), while an alternative, requires lengthy scan times and high patient compliance. Purpose: To develop a novel, rapid whole body bone screening (WB-RBS) MRI protocol and evaluate its diagnostic performance for bone metastasis detection. Materials and Methods: Patients with pathologically confirmed malignancies and healthy controls were prospectively enrolled. All participants underwent WB-RBS (acquisition time: about 10 min); patients additionally underwent WB-MRI (about 70 min). Three radiologists, blinded to clinical data, independently evaluated the images for bone metastases. A consensus expert diagnosis served as the reference standard to calculate the diagnostic performance of WB-RBS. Specificity was further assessed in the healthy control group. Results: Seventy patients and 19 healthy controls were included. WB-RBS demonstrated excellent inter-reader agreement at the patient level. Compared with the reference standard, WB-RBS achieved an accuracy of 77.1%-91.4% at the patient level and a slightly lower accuracy (70.6%-82.5%) at the lesion level. At diagnostic confidence thresholds 1-3, the correlations between WB-RBS ratings and the reference standard were statistically significant for both patient- and lesion-level analyses. Conclusion: WB-RBS showed favorable inter-reader agreement and high accuracy for bone metastasis screening at the patient level, while substantially reducing scan time and cost. Its rapid, radiation-free nature and high accessibility offer distinct clinical advantages, supporting its potential as an alternative screening tool to conventional bone scintigraphy.
Lorenz, D.; Jansen, S.; Knoche, J.; Wolf-Sebottendorff, R.; Awad, H. J.; Toker, I.
Background. Guided structured reporting has been proposed to address the limited availability of structured data in radiology, yet empirical evidence on its real-world adoption across users and imaging modalities remains scarce. Objective. To describe the adoption dynamics of a guided structured reporting system across multiple users and imaging modalities during a six-week implementation period. Methods. Retrospective observational study at two public tertiary hospitals in Abu Dhabi, United Arab Emirates. A guided structured reporting system was deployed for computed tomography (CT), magnetic resonance imaging (MRI), and mammography. Seven radiologists participated. The primary outcome was active in-software reporting time, recorded via system logs of mouse and keyboard interaction. Temporal trends in median reporting time per modality and individual user trajectories were analysed descriptively. After predefined data cleaning, 126 reports were included (84 CT, 27 MRI, 15 mammography). Results. Active in-software reporting time decreased across all modalities. Median reporting time fell from 130 s to 56 s for CT, from 383 s to 60 s for MRI, and from 126 s to 46 s for mammography (week 1 to week 6). Individual trajectories showed similar patterns, with the largest reductions during the early implementation phase. Subgroup analyses were limited by small sample sizes. Conclusions. Guided structured reporting was integrated into routine clinical workflows with temporal reductions in active reporting time across users and modalities, providing empirical evidence on the feasibility of workflow-integrated structured reporting in radiological practice.
de Boer, S.; Häntze, H.; Ziegelmayer, S.; van Ginneken, B.; Prokop, M.; Bressem, K. K.; Hering, A.
Background: Medical imaging, especially computed tomography and magnetic resonance imaging, is essential in clinical care of patients with renal cell carcinoma (RCC). Artificial intelligence (AI) research into computer-aided diagnosis, staging and treatment planning needs curated and annotated datasets. Across the literature, The Cancer Genome Atlas (TCGA) datasets are widely used for model training and validation. However, re-annotation is often necessary due to limited access to public annotations, raising entry barriers and hindering comparison with prior work. Methods: We screened 1915 CT scans from three TCGA-RCC databases and employed a segmentation model to annotate kidney lesions. After a metadata-based exclusion step, we hosted a reader study with all papillary (n=56), chromophobe (n=27) and 200 randomly selected clear cell RCC cases. Two students quality-checked and corrected the data and annotated tumors and cysts. Uncertain cases were checked by a board-certified radiologist. Results: After data exclusion and quality control, a total of 142 annotated CT scans from 101 patients (26 female, 75 male, mean age 56 years) remained. This includes 95 CTs with clear cell RCC, 29 with papillary RCC and 18 with chromophobe RCC. Images and voxel-level annotations of kidneys and lesions are open sourced at https://zenodo.org/records/19630298. Conclusion: By making the annotations open-source, we encourage accessible and reproducible AI research for renal cell carcinoma. We invite other researchers who have previously annotated any of these cohorts to share their annotations.
Song, E. C.; Bernstein, M. H.; Sheppard, B.; Bruno, M. A.; Baird, G. L.
Background: With growing impetus to integrate artificial intelligence (AI) tools into radiology, clinical practices must navigate workflow redesign. This carries implications for medical malpractice liability. Methods: We conducted an online vignette experiment with United States adults who acted as hypothetical jurors in a malpractice case involving a missed intracranial hemorrhage. Participants (n=2,347) were randomized to one of 22 conditions: a no-AI control and 21 conditions involving a hypothetical AI system. These 21 conditions varied by whether (1) a single-read or double-read workflow was used, (2) the radiologist's initial interpretation was documented, (3) the radiologist changed their interpretation after viewing AI output, (4) the AI detected the abnormality, and (5) the AI error rate, expressed as the False Discovery Rate (FDR) or False Omission Rate (FOR), was provided to participants only, to both participants and the radiologist, or to neither. The primary outcome was perceived liability, assessed by whether the radiologist met their duty of care. Findings: Perceived liability differed across conditions (p<0.0001). Double-read workflows (p<0.0001), documenting initial interpretations (p=0.0125), and providing participants with AI error rates, including the FDR (p=0.0038) or FOR (p=0.0035), reduced perceived liability. Liability was also lower when AI was incorrect (p<0.0001). Radiologists' awareness of AI error rates did not significantly impact liability. Notably, we observed an erroneous change penalty: the greatest liability occurred when radiologists initially identified an abnormality but later changed their interpretation to normal after seeing that AI identified the case as normal; conversely, perceived liability was lowest with documented, double-read workflows. Interpretation: Double-read workflows with documented initial interpretations and disclosure of AI error rates reduce perceived liability, though changing a correct initial interpretation increases it. 
Strategic workflow design is critical for successful AI implementation that can mitigate malpractice risk.
Jean, A.; Benillouche, P.; Jacques, T.
This study analyzes the adoption, barriers, and expectations of French radiologists regarding the use of Artificial Intelligence (AI) solutions in their daily practice. Despite a recognition of AI's potential to make radiology more precise, predictive, and personalized, its adoption remains limited. The main obstacles identified are the high cost of those solutions and the insufficient equipment of French imaging centers with AI technologies. Nevertheless, the survey reveals a strong willingness to adopt, with over 70% of radiologists expressing their desire to use AI and 0% declaring a refusal to use it. Furthermore, the radiologists' fears of being replaced by AI are very low (0 to 8.8%).
Gunwhy, E. R.; Kurugol, S.; Serai, S.; van der Molen, A. J.; Abou El-Ghar, M.; Buckley, D. L.; Hockings, P. D.; Jones, R. A.; Lim, R. P.; Mendichovszky, I. A.; Pedersen, M.; Reynolds, H. M.; Sanmiguel Serpa, L. C.; Wentland, A.; Zoellner, F. G.; Sourbron, S.; Dekkers, I. A.
Background: Dynamic contrast-enhanced (DCE) MRI has the potential to be a useful tool for non-invasively assessing renal haemodynamics and function; however, insufficient standardisation and difficulties in post-processing remain barriers to clinical translation. Purpose: To develop expert consensus-based technical recommendations for performing renal DCE-MRI in humans, relating to aspects of patient preparation, MRI hardware and acquisition parameters, and data analysis. Study Type: Systematic consensus process using an approximation to the two-step modified Delphi method. Population: Not applicable. Field Strength / Sequence: 1.5 T and 3 T / Renal gradient echo-based 3D DCE-MRI. Assessment: An international panel of experts was recruited and surveyed following a modified Delphi method to create consensus-based technical recommendations. Key areas for consensus were initially identified through a mixture of online and in-person discussions, and an initial survey round consisting of open- and close-ended questions. Consensus statements were formulated and iteratively refined to create the final recommendations. Statistical Tests: Consensus was defined as ≥75% agreement in response (excluding abstentions), and clear preference was defined as 60-74% agreement among the experts. Statements with ≥40% abstentions were either excluded from subsequent survey rounds or recirculated as a modified statement. Results: 22 experts initially participated in the Delphi panel, of which 16 responded to the first survey. 15 panellists responded to all subsequent surveys. Out of 46 statements, 37 reached consensus and one showed clear preference. ≥40% abstention was found in seven statements, which were excluded from the final set of recommendations. Data conclusion: These recommendations provide a starting point for MRI centres worldwide wishing to perform renal DCE-MRI, contributing to the harmonisation of DCE-MRI scan protocols and facilitating clinical translation. 
These recommendations provide a practical minimum technical dataset for renal DCE-MRI acquisition and analysis to improve cross-site comparability and support responsible clinical translation.
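The consensus rules in this abstract (≥75% agreement excluding abstentions, 60-74% for clear preference, ≥40% abstentions leading to exclusion or recirculation) can be written as a small decision function. The sketch below is hypothetical: it assumes each panellist either agrees, disagrees or abstains, it treats any agreement from 60% up to just under 75% as clear preference, and the function and label names are invented for illustration.

```python
def classify_statement(agree: int, disagree: int, abstain: int) -> str:
    """Classify one Delphi statement from raw vote counts.

    Integer arithmetic is used for the percentage comparisons so that
    boundary cases (exactly 40%, 60% or 75%) are not affected by
    floating-point rounding.
    """
    if min(agree, disagree, abstain) < 0:
        raise ValueError("vote counts must be non-negative")
    total = agree + disagree + abstain
    if total == 0:
        raise ValueError("at least one response is required")

    # Rule from the abstract: >=40% abstentions -> exclude or recirculate.
    if abstain * 100 >= 40 * total:
        return "excluded_or_recirculated"

    # Agreement is computed over non-abstaining respondents only.
    voters = agree + disagree
    if agree * 100 >= 75 * voters:
        return "consensus"
    if agree * 100 >= 60 * voters:  # assumption: covers 60% up to <75%
        return "clear_preference"
    return "no_consensus"


if __name__ == "__main__":
    # Hypothetical vote splits for a 15-member panel.
    for votes in ((12, 3, 0), (9, 5, 1), (5, 4, 6), (7, 8, 0)):
        print(votes, "->", classify_statement(*votes))
```

The vote splits in the usage loop are made up to exercise each branch and do not correspond to any statement in the study.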
Alqaderi, H.; Kapadia, U.; Brahmbhatt, Y.; Papathanasiou, A.; Rodgers, D.; Arsenault, P.; Cardarelli, J.; Zavras, A.; Li, H.
Background: Dental caries and periodontal disease represent the most prevalent global oral health conditions, collectively affecting several billion people. The diagnostic interpretation of dental radiographs, a cornerstone of modern dentistry, is associated with considerable inter-observer variability. In routine clinical practice, clinicians are required to evaluate a high volume of radiographic images daily, a cognitively demanding task in which diagnostic fatigue, time constraints, and the inherent complexity of overlapping anatomical structures can lead to the inadvertent oversight of early-stage pathologies. Artificial intelligence (AI) offers a transformative opportunity to augment clinical decision-making by providing rapid, objective, and consistent radiographic analysis, thereby serving as a tireless adjunct capable of flagging findings that may be missed during routine human inspection. Methods: This study developed and validated a deep learning system for the automated detection of dental caries and alveolar bone loss using a dataset of 1,063 periapical and bitewing radiographs. Two separate YOLOv8s object detection models were trained and evaluated using a rigorous 5-fold cross-validation methodology. To align with the clinical use-case of a screening tool where high sensitivity is paramount, a custom image-level evaluation criterion was employed: a true positive was recorded if any predicted bounding box had a Jaccard Index (IoU) > 0 with any ground truth annotation. Model performance was systematically evaluated at confidence thresholds of 0.10 and 0.05. Results: At a confidence threshold of 0.05, the caries detection model achieved a mean precision of 84.41% (±0.72%), recall of 85.97% (±4.72%), and an F1-score of 85.13% (±2.61%). The alveolar bone loss model demonstrated exceptionally high performance, with a mean precision of 95.47% (±0.94%), recall of 98.60% (±0.49%), and an F1-score of 97.00% (±0.46%). 
Conclusion: The YOLOv8-based models demonstrated high accuracy and high sensitivity for detecting dental caries and alveolar bone loss on periapical radiographs. The system shows significant potential as a reliable automated assistant for dental practitioners, helping to improve diagnostic consistency, reduce the risk of missed pathology, and ultimately enhance the standard of patient care.
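The image-level criterion in this abstract counts an image as a true positive as soon as any predicted box overlaps any ground-truth box at all (IoU > 0). The sketch below illustrates how permissive that rule is. Only the true-positive rule comes from the abstract. The handling of the other outcomes, the `(x_min, y_min, x_max, y_max)` box format, and all names are assumptions, and predictions are assumed to be already filtered by the chosen confidence threshold.

```python
Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed format


def area(box: Box) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, b: Box) -> float:
    """Jaccard index (intersection over union) of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    union = area(a) + area(b) - intersection
    return intersection / union if union > 0 else 0.0


def image_level_outcome(predicted: list[Box], truth: list[Box]) -> str:
    """Label one image as TP, FN, FP or TN.

    TP follows the abstract: any predicted box with IoU > 0 against any
    ground-truth box. The remaining branches are assumptions made here.
    """
    if any(iou(p, t) > 0 for p in predicted for t in truth):
        return "TP"
    if truth:        # assumption: a lesion exists but nothing overlaps it
        return "FN"
    if predicted:    # assumption: detections on an image with no lesion
        return "FP"
    return "TN"


if __name__ == "__main__":
    lesion = (50.0, 50.0, 80.0, 80.0)
    barely_touching = (79.0, 79.0, 200.0, 200.0)  # overlaps by a 1x1 corner
    print(iou(barely_touching, lesion))
    print(image_level_outcome([barely_touching], [lesion]))
```

In the usage example a large box that shares only a one-pixel corner with the lesion still yields a true positive, which is worth keeping in mind when comparing these recall figures with studies that use a stricter IoU cut-off.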
Ludwig, K. D.; Hatt, C. R.; Keith, L.; Matyga, A. W.; Te, H. S.; Landeras, L.; Chelala, L.; Patel, A. R.; Chung, J. H.
Objective: Coronary artery calcification (CAC) assessment for cardiovascular risk stratification is traditionally achieved using ECG-gated computed tomography (CT). Automated deep-learning (DL) algorithms may streamline opportunistic CAC detection and scoring, particularly on non-gated CT scans. This study evaluated the performance of a fully automated DL-based CAC scoring algorithm ("DL-CAC") against expert human scoring. Methods: The algorithm was trained on 1,260 chest CT scans from multiple databases to automatically identify coronary calcium, calculate Agatston scores, and assign a cardiovascular disease (CVD) risk classification. Performance was assessed on a holdout dataset (n=500) comprising ECG-gated calcium scoring CT scans and lung cancer screening non-gated chest CTs as well as in an external, independent CT dataset (n=129) from liver transplant candidates. Agreement with expert scoring was assessed using intraclass correlation coefficient (ICC) for Agatston scores and Cohen's κ for CVD risk classification. Results: The algorithm demonstrated high agreement with expert scoring in the pooled calcium scoring and lung cancer screening cohorts, with an ICC of 0.947 for Agatston scores and κ of 0.936 for CVD risk classification. For liver transplant candidates, the algorithm exhibited substantial agreement with expert scoring of non-gated CT scans (κ=0.79) and a sensitivity of 90.4% and specificity of 96.4% in high-risk cases. Conclusion: These findings suggest that DL-based CAC scoring on non-gated CT scans may be a feasible alternative to traditional methods and could support opportunistic cardiovascular risk assessment in routine imaging. Further validation is warranted to assess clinical integration in broader practice settings.
Tang, W.; Dong, Y.; Chen, J.; Yang, Y.; Huang, H.; Yu, M.; Zhu, J.; Shen, G.
Background. Tethered cord syndrome (TCS) is classically associated with a low-lying conus medullaris, yet many surgically treated children have a normally positioned conus (occult TCS). Large-scale normative data on conus position in children, and the diagnostic value of quantitative conus assessment, are limited. Purpose. To establish a large-cohort reference distribution for conus medullaris termination level in children, to quantify conus position in children surgically treated for presumed (occult) TCS, and to test whether automated conus segmentation and radiomics can distinguish TCS from normal. Materials and Methods. In this retrospective single-center study, conus termination level was extracted from structured radiology reports of consecutive pediatric lumbosacral MRI examinations and encoded numerically (L1 = 1, L2 = 2, etc.). Children surgically treated for tethered cord were identified by linkage to an operative registry (name and date of birth) and restricted to preoperative examinations. A deep-learning model (nnU-Net) was trained for conus segmentation on axial T2-weighted images. IBSI-compliant radiomic features were extracted; reproducibility was assessed by intra- and inter-observer intraclass correlation (ICC). A case-control radiomics analysis used batch-only ComBat harmonization and cross-validated L1-penalized logistic regression; discrimination was compared with conus level by paired bootstrap. Results. Among 9,808 examinations with a parseable conus level (98.5% of reports; parser validated against dual blinded annotation, 99.4% agreement, weighted kappa 0.946), the conus terminated in the L1 region in 85.7% and the L2 region in 14.3% of the reference cohort (postoperative examinations excluded, n = 9,655); a low-lying conus (>=L3) occurred in only 0.05% (5/9,655), and remained rare (0.14%, 14/9,808) including operated examinations (median L1; mean 1.13 +/- 0.33). 
A slightly more cephalad position was seen with increasing age (negligible correlation). Among 475 preoperative children surgically treated for tethered cord, 99.6% had a normally positioned conus (<=L2) and only 0.4% were low-lying. Automated conus segmentation achieved a held-out Dice of 0.85. Conus radiomics likewise did not distinguish TCS from controls (equivalence-tested null; full segmentation/radiomics pipeline reported in the companion methodological paper). Conclusion. In children, the conus medullaris terminates at L1-L2 in more than 99% of cases and is normally positioned in virtually all children surgically treated for TCS. Within the conus, neither position nor texture (radiomics) identifies tethered cord; whether the filum terminale carries a diagnostic signal was not tested here.
Chen, J.; Shi, D.; Su, J.; Huang, X.; Qian, Y.
The severity stratification of carpal tunnel syndrome (CTS) relies on ultrasound morphological markers and electromyography (EMG). However, it remains unclear how reliably structural imaging can infer functional impairment. Clarifying the structure-function relationship is critical for efficient diagnostic pathways. A retrospective cohort of 55 patients with symptoms related to CTS was analyzed at the Shanghai Sixth People's Hospital. All patients underwent ultrasound and EMG. CTS was diagnosed in 72.7% of cases, with a female predominance and equal left-right involvement. Random-forest classifiers were trained using surrogate splits, and performance was evaluated using out-of-bag predictions. A full-feature model (34 candidate variables) was compared against a simplified model (8 core variables) capturing the core morphological and electrophysiological features. A residual-based framework was then used to characterize the structure-function mismatch within severity grades (1a-3c). The simplified model improved discriminative performance compared to the full-feature model (AUC 0.789 to 0.824). The simplified model achieved an overall accuracy of 77.3%. Analysis of predicted probability distributions and 10-bin calibration curves indicated stable and clinically interpretable risk estimation in most probability ranges. Permutation-based importance analysis confirmed that both ultrasound and electrophysiological features contributed substantively to prediction. Residual-based grading further revealed structure-function heterogeneity within each main severity grade. CTS severity can be stratified using a limited set of complementary morphological and electrophysiological features. Structure-function mismatch supports an imaging-led initial screening, with electrophysiology reserved for selected patients.
Rashed, M.; Alabdulrahman, H.
Background Automated pelvic CT segmentation has advanced to reliable coarse bone extraction. Yet the structured anatomical hierarchy required for morphometry, fixation planning, bone quality mapping, and arthroplasty workflows remains unachieved. This study developed and validated a fully automated anatomy-informed pipeline that converts standard pelvic CT into a comprehensive, surgeon-readable subsegmentation of the pelvis and proximal femur. Methods Pelvic CT datasets were retrospectively collected from anonymized archives of hospitals affiliated with the Directorate of Health Affairs, Sharqia, Egypt. After eligibility screening, 757 normal adult cases were processed using a custom one-click 3D Slicer pipeline integrating TotalSegmentator for coarse extraction, followed by deterministic anatomy-based subsegmentation into 81 segments. One hundred randomly selected cases were validated against expert-corrected reference segmentations using Dice similarity coefficient, volume difference, surface distance metrics, and bilateral symmetry analysis. Results Of 1,316 screened cases, 757 met eligibility criteria. Across 8,100 case-segment observations, the pipeline achieved a mean Dice of 0.9926 +/- 0.0465. Complete agreement was observed for the sacrum, ilium, acetabulum, anterior and posterior columns, sciatic buttress, and all landmarks. Relative decreases were confined to boundary-dependent regions. Bilateral symmetry analysis confirmed a median surface agreement of 99.85% within 5 mm. Conclusion The pipeline demonstrated high accuracy and reproducibility across a large normal adult dataset, establishing a structured anatomical foundation for quantitative pelvic analysis and surgical planning workflows. Clinical feasibility across abnormal anatomy and decision-level applications awaits dedicated validation.
Hofmeister, J.; Brina, O.; Rosi, A.; Bernava, G.; Reymond, P.; Muster, M.; Lovblad, K.-O.; Machi, P.
Background: Three-dimensional visualization and quantitative analysis of cerebral arteries on 3DRA are central to endovascular treatment planning, device selection, and cerebrovascular research. Manual segmentation is time-consuming and operator-dependent, yet no open-source deep learning model has been prospectively validated for this task on 3DRA. Methods: An nnUNet v2 model was trained for binary cerebral artery segmentation on 400 consecutive 3DRA acquisitions from three angiographic systems, comparing four configurations across architectures and loss functions. The best-performing configurations were prospectively validated on 40 patients using a dual approach: quantitative metrics (DSC, clDice, HD95, ASD, Precision, Recall), and blinded expert qualitative evaluation by two interventional neuroradiologists assessing 12 arterial segments, a global quality score, and clinical usability across 40 test cases. Results: The ensemble model achieved median DSC 0.917, clDice 0.932, and HD95 1.494 mm. Global quality scores were significantly lower for nnUNet v2 than for expert segmentations (median 4 vs 5, p<0.001), but nnUNet v2 segmentations were rated clinically usable in 88-90% of cases versus 95-98% for expert segmentations, without significant difference on the binary usability criterion. A consistent proximal-to-distal quality gradient was identified, with comparable scores at proximal arteries and the largest differences at distal arterial segments. Conclusion: nnUNet v2 with topology-aware training provides clinically usable cerebral artery segmentations on 3DRA, prospectively validated through both quantitative metrics and structured expert qualitative assessment, and represents a reproducible open-source foundation for endovascular and research applications.
Singh, V.; Jhamb, A.; Sil, S.; Kumar, S.; Agrawal, C.; Pareek, A.; Gautam, A.; Parale, G.; Singh, S.; Padmanabhan, D.
Background: A critical radiologist shortage exists in India, leading to delayed chest radiograph (CXR) interpretation. This leads to disease progression, higher morbidity, and mortality. Artificial intelligence-based CXR interpretation by Lenek Intelligent Radiology Assistant (LIRA) is a promising solution. This study aims to establish the screening and triaging capabilities of LIRA by assessing its accuracy in detecting abnormalities and pathologies in CXRs from geographically diverse institutions. Methods: We conducted a retrospective multi-source validation of the diagnostic accuracy of LIRA for the detection of general abnormalities, tuberculosis, consolidation, pleural effusion, pneumothorax, and cardiomegaly. De-identified chest radiographs were input into LIRA models. The obtained interpretations were compared to the established ground truth reporting for the calculation of sensitivity, specificity, and AUROC with 95% CI for individual pathologies across varying probability thresholds. Results: LIRA demonstrated high sensitivity for general abnormality detection (AUROC 0.93-0.986, 84.4-97.1% sensitivity, 88.9-92.4% specificity) and tuberculosis triaging (Shenzhen & Montgomery: 88.5-89.7% sensitivity, 89.9-90.5% specificity; Jaypee: 98.7% sensitivity, 63.6% specificity). For consolidation (AUROC 0.884-0.895, 96.4-96.9% sensitivity, 70.8-77.1% specificity), pleural effusion (AUROC 0.942-0.967, 79.7-99.1% sensitivity, 81.2-87.7% specificity), pneumothorax (AUROC 0.87, 90.6-94.8% sensitivity, 79.5-82.7% specificity) and cardiomegaly (AUROC 0.883, 95.1% sensitivity, 81.6% specificity), the model exhibited commendable accuracy as well. Conclusions: The diagnostic performance of LIRA was consistent across various pathologies and chest radiographs from diverse geographic locations, with particular strengths in abnormality detection and tuberculosis screening. The risk-stratified triaging and high sensitivity of LIRA make it a reliable adjunct solution to address radiologist shortages, reduce turnaround times, and support India's tuberculosis elimination goals.
Sparnon, E.; Stevens, K.; Song, E.; Harris, R. J.; Strong, B. W.; Bruno, M. A.; Baird, G. L.
The present study evaluates the real-world clinical predictive performance of FDA-authorized artificial intelligence (AI) devices used in radiology, focusing on the false positive paradox (FPP) and its implications for clinical practice. To do this, we analyzed publicly available FDA data on AI radiology devices from 2024 and 2025 from 510(k) summaries, demonstrating how diagnostic accuracy metrics like sensitivity and specificity do not necessarily translate into high positive predictive value (PPV) due to the influence of target disease prevalence. We show the importance of disclosing the false discovery (FDR) and false omission rates (FOR) and argue that this transparency enables clinicians to select AI systems that balance false positive and false negative costs in a clinically, ethically, and financially appropriate manner. Finally, we provide recommendations for what data should be provided to best serve practices and radiologists.
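The prevalence effect described in this abstract can be illustrated with a minimal sketch. The function name and the sensitivity, specificity, and prevalence values below are hypothetical and chosen only for illustration; they are not taken from the study or from any FDA 510(k) summary. The calculation is the standard application of Bayes' rule to a binary test.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return PPV, FDR, NPV, and FOR for a binary test at a given prevalence.

    All inputs are proportions in (0, 1). This is a generic Bayes' rule
    calculation, not the method of any specific device or study.
    """
    tp = sensitivity * prevalence               # true positive fraction
    fn = (1 - sensitivity) * prevalence         # false negative fraction
    fp = (1 - specificity) * (1 - prevalence)   # false positive fraction
    tn = specificity * (1 - prevalence)         # true negative fraction
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {"ppv": ppv, "fdr": 1 - ppv, "npv": npv, "for": 1 - npv}


# Hypothetical device: 95% sensitivity and 95% specificity.
for prevalence in (0.01, 0.10, 0.50):
    result = predictive_values(0.95, 0.95, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={result['ppv']:.3f}  "
          f"FDR={result['fdr']:.3f}  FOR={result['for']:.4f}")
```

Under these assumed values, the same device yields a much lower PPV at 1% prevalence than at 50% prevalence, which is the pattern the abstract refers to as the false positive paradox.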
Deckers, Q.; Uniken Venema, S. M.; Braun, K.; van der Zwan, B.; Deckers, P. T.; Siero, J. C. W.; Bhogal, A.
Background: Intracranial steno-occlusive disease (SOD) assessment benefits from hemodynamic imaging, but comprehensive evaluation often relies on contrast- or radiation-based techniques. Arterial spin labeling (ASL) provides a non-invasive alternative for quantifying tissue-level perfusion and cerebrovascular reactivity, yet does not capture upstream arterial flow dynamics. As a result, non-invasive assessment of macrovascular hemodynamics for SOD remains limited. This study evaluates whether quantitative 4D-MRA provides complementary arterial information beyond established ASL-derived metrics. Methods: Twelve SOD patients (7 women; age 42.3±25.8 years) underwent multi-delay ASL and 4D-MRA before and after acetazolamide. Cerebrovascular reactivity (CVR), arterial transit time (ATT), macrovascular ATT (mATT), and labeled blood volume (LBV) were quantified. Associations and vasodilatory responses were assessed using linear mixed-effects models. Results: At baseline, mATT correlated with ATT (β=0.66±0.08, p<0.001). Both decreased following acetazolamide (mATT: 1.07±0.03 s to 1.01±0.03 s, p=0.029; ATT: 1.63±0.07 s to 1.40±0.07 s, p<0.001). However, changes in mATT and ATT were not associated with CVR. In contrast, CVR was positively associated with ΔLBV (β=8.84, SE=2.43, p=0.01). Case analyses further demonstrated artery-level delayed inflow and vascular steal. Conclusion: Quantitative 4D-MRA provides complementary macrovascular information to ASL in SOD. ΔLBV more consistently reflects cerebrovascular reactivity than transit-based metrics and is sensitive to artery-level delayed inflow and vascular steal. The local Medical Ethical Review Committee declared that the Medical Research Involving Human Subjects Act (WMO) did not apply (internal trial nr. 21-406).
Golshani, P.; Joseph, M. S.
Objective: To characterize the magnitude and geographic distribution of commercially negotiated hospital facility rates for fourteen common interventional radiology (IR) procedures using publicly posted Hospital Price Transparency Machine-Readable Files (MRFs), and to describe the relationships between state-level commercial pricing, population rurality, and within-system rate uniformity. Methods: In this cross-sectional observational analysis, we examined hospital-weighted commercial rate observations from U.S. hospital MRFs for fourteen IR procedures spanning image-guided drainage, embolization, peripheral vascular intervention, dialysis access maintenance, and percutaneous spine. The unit of analysis was one observation per distinct negotiated rate per state-CPT cell, deduplicating multi-facility same-system reporting in which two or more hospitals posted identical rate, range, and payer-count tuples. Outliers were excluded using transparent absolute and CMS-relative bounds. State-level statistics were computed where ≥5 distinct hospital-system observations were reported. Commercial rates were compared to CY 2026 CMS Outpatient Prospective Payment System (OPPS) facility payments. Relationships between state-level commercial rate and 2020 U.S. Census percent-rural population were assessed by Spearman rank correlation. Results: Across 14 procedures, state-level commercial median rates varied 3.7- to 8.3-fold between the highest- and lowest-priced states. The largest spreads were observed for fem-pop angioplasty (CPT 37224, 8.3-fold), fem-pop atherectomy (37225, 8.1-fold), and iliac stenting (37221, 7.1-fold). National median commercial rates ranged from 1.34x (PAE/GAE) to 3.60x (paracentesis) the corresponding CMS OPPS facility payment. Across all 14 procedures, the relationship between state percent-rural and median commercial rate was negative (mean Spearman ρ = -0.46, range -0.33 to -0.80; 14 of 14 codes negative), with the most-rural quartile of states showing a median commercial rate 42% below the most-urban quartile. Deduplication identified 660 multi-facility groups in which a single negotiated rate was applied across two or more affiliated hospitals within a state. Discussion: Substantial state-level variation in commercially negotiated facility rates exists for common IR procedures, with consistently lower rates in more rural states. Within-system rate uniformity is a frequent feature: many regional health systems post identical commercial rates across multiple owned facilities. The findings are consistent with prior literature linking commercial pricing to market structure and support continued investment in price transparency as a precondition for informed decision-making.